Treebanks Gone Bad: Generating a Treebank of Ungrammatical English
Abstract
This paper describes how a treebank of ungrammatical sentences can be created from a treebank of well-formed sentences. The treebank creation procedure involves the automatic introduction of frequently occurring grammatical errors into the sentences in an existing treebank, and the minimal transformation of the analyses in the treebank so that they describe the newly created ill-formed sentences. Such a treebank can be used to test how well a parser is able to ignore grammatical errors in texts (as people can), and can be used to induce a grammar capable of analysing such sentences. This paper also demonstrates the first of these uses.
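The two steps named in the abstract, introducing an error into a sentence and minimally adjusting its analysis, can be illustrated with a small sketch. The abstract does not say which error types are introduced or how the trees are represented, so everything here is an assumption made for illustration: the error is a single missing word, trees are hypothetical `(label, children)` tuples in which a preterminal holds its word as a string, and the names `leaves` and `delete_word` are invented for this example.

```python
def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    label, children = tree
    if isinstance(children, str):  # preterminal: (POS tag, word)
        return [children]
    words = []
    for child in children:
        words.extend(leaves(child))
    return words


def delete_word(tree, index):
    """Return a copy of `tree` without the word at position `index`.

    Assumed "minimal transformation": the preterminal for the deleted word
    is removed, any constituent left with no children is pruned, and all
    other structure is kept unchanged. An out-of-range index yields an
    unchanged copy.
    """
    position = [0]  # running word counter shared by the recursive walk

    def walk(node):
        label, children = node
        if isinstance(children, str):
            current = position[0]
            position[0] += 1
            return None if current == index else (label, children)
        kept = []
        for child in children:
            result = walk(child)
            if result is not None:
                kept.append(result)
        return (label, kept) if kept else None

    return walk(tree)


# A well-formed sentence with a hand-written (hypothetical) analysis.
original = (
    "S",
    [
        ("NP", [("DT", "The"), ("NN", "dog")]),
        ("VP", [("VBZ", "barks")]),
    ],
)

# Drop the determiner to create a missing-word error.
ill_formed = delete_word(original, 0)
print(" ".join(leaves(ill_formed)))
```

In this sketch the ill-formed tree keeps the NP and VP of the original, so a parser's output on the corrupted sentence could be scored against an analysis that differs from the gold tree only where the error was introduced.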
Similar resources
Treebanks Gone Bad: Parser Evaluation and Retraining using a Treebank of Ungrammatical Sentences
This article describes how a treebank of ungrammatical sentences can be created from a treebank of well-formed sentences. The treebank creation procedure involves the automatic introduction of frequently occurring grammatical errors into the sentences in an existing treebank, and the minimal transformation of the original analyses in the treebank so that they describe the newly created ill-form...
Generating a Persian Constituency Treebank by Automatic Conversion
Treebanks are among the most important and useful resources for natural language processing tasks. Dependency structure and phrase structure are the two best-known kinds of treebank annotation. Many efforts have already been made to convert dependency structure to phrase structure. In this paper we study an approach to converting dependency structure to phrase structure, motivated by the lack of a large phrase-structure treebank for Persian. A...
How Bad Is The Problem Of PP-Attachment? A Comparison Of English, German And Swedish
The correct attachment of prepositional phrases (PPs) is a central disambiguation problem in parsing natural languages. This paper compares the baseline situation in English, German and Swedish based on manual PP attachments in various treebanks for these languages. We argue that cross-language comparisons of the disambiguation results in previous research are impossible because of the different...
How bad is the problem of PP-attachment?
The correct attachment of prepositional phrases (PPs) is a central disambiguation problem in parsing natural languages. This paper compares the baseline situation in English, German and Swedish based on manual PP attachments in various treebanks for these languages. We argue that cross-language comparisons of the disambiguation results in previous research are impossible because of the different...
Towards Building Parallel Dependency Treebanks: Intra-Chunk Expansion and Alignment for English Dependency Treebank
The paper presents our work on the annotation of intra-chunk dependencies on an English treebank that was previously annotated with inter-chunk dependencies, and for which there exists a fully expanded parallel Hindi dependency treebank. This provides fully parsed dependency trees for the English treebank. We also report an analysis of the inter-annotator agreement for this chunk expansion task...
Publication date: 2006